Fast and Parallel Mining of K High Utility Item Set
Abstract
A large number of contributions in the literature address item set mining, exploring various measures according to the chosen relevance criteria. In many real applications, such as retail marketing and network log analysis, items differ in many respects, and these differences strongly affect decision making. Traditional association rule mining (ARM) therefore cannot meet the demands of these applications. By treating the different values of individual items as utilities, parallel mining focuses on identifying the item sets with high utilities. Parallel mining of high utility item sets takes much less time than mining on a single system over a large number of transactions. The most studied measure is probably the number of frequent item sets processed in import and export business processes. While the problem has been widely studied, only a few solutions scale. This is particularly the case when (i) the data set is massive, calling for large-scale distribution, (ii) the length k of the informative item set to be discovered is high, and/or (iii) the data are dynamic. In this paper, we address the problem of parallel mining of large informative k-High Utility item sets (liki) based on joint entropy. We propose advanced FPHIKS (Fast Parallel Highly Informative k-High Utility item sets), a highly scalable, parallel liki mining algorithm, together with a forward selection algorithm. FPHIKS makes mining large-scale databases (up to two or three terabytes of data) succinct and effective. Its mining process consists of only two efficient parallel jobs. With FPHIKS, we provide a set of significant optimizations for calculating the joint entropies of liki of different sizes, which drastically reduce the execution time of the mining process.
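The abstract names two building blocks, joint entropy of an item set and forward selection, without giving their details. The following is a minimal sequential sketch of how they might fit together, assuming each item is treated as a binary presence/absence feature of a transaction. The function names `joint_entropy` and `forward_selection` are hypothetical, and the sketch does not reproduce the two parallel jobs or the optimizations of FPHIKS.

```python
import math
from collections import Counter


def joint_entropy(transactions, itemset):
    """Joint entropy (in bits) of the presence/absence features of `itemset`.

    Assumption: each item is a binary feature of a transaction.
    """
    items = sorted(itemset)
    n = len(transactions)
    # Project every transaction onto the item set: one True/False tuple each.
    patterns = Counter(tuple(i in t for i in items) for t in transactions)
    return -sum((c / n) * math.log2(c / n) for c in patterns.values())


def forward_selection(transactions, k):
    """Greedily build an item set of size k.

    At each step, add the item that maximizes the joint entropy of the
    item set chosen so far. This is a heuristic, not an exact search.
    """
    candidates = sorted(set().union(*transactions))
    chosen = []
    for _ in range(min(k, len(candidates))):
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: joint_entropy(transactions, chosen + [c]),
        )
        chosen.append(best)
    return chosen


# Toy data: "c" occurs in every transaction, so it carries no information.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"c"}]
selected = forward_selection(transactions, 2)
```

On this toy data the greedy step skips `c`, since an item present in every transaction has zero entropy, and picks items whose presence patterns split the transactions most evenly.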
Similar resources
Efficient Utility Based Infrequent Weighted Item-Set Mining
Association Rule Mining (ARM) is one of the most popular data mining techniques. Most past work is based on frequent item-sets. In recent years, researchers have focused on infrequent item-set mining. The infrequent item-set mining problem is to discover item-sets whose frequency in the data is less than or equal to a maximum threshold. This paper addresses the min...
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI
Classical frequent itemset mining identifies frequent itemsets in transaction databases using only the frequency of item occurrences, without considering the utility of items. In many real-world situations, the utility of itemsets is based on the user's perspective, such as cost, profit or revenue, and is of significant importance. Utility mining considers using utility factors in data mining tasks. Utility-...
Utility of Complex Alternatives in Multiple-Choice Items: The Case of All of the Above
This study investigated the utility of all of the above (AOTA) as a test option in multiple-choice items. It aimed at estimating item fit, item difficulty, item discrimination, and guess factor of such a choice. Five reading passages of the Key English Test (KET, 2010) were adapted. The test was reconstructed in 2 parallel forms: Test 1 did not include the abovementioned alternative, whereas Te...
Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)
In this paper, we aim to improve association rules so that they can be used in recommenders. Recommender systems provide a method for creating personalized offers. One of the most important types of recommender system is collaborative filtering, which mines user information and offers users the appropriate item. Among the data mining methods, finding frequent item sets and ...
Study on High Utility Itemset Mining
Data mining is the process of mining new, nontrivial and potentially valuable information from large databases. Data mining has been used in the analysis of customer transactions in retail research, where it is termed market basket analysis. Earlier data mining methods concentrated more on the correlation between the items that occur more frequently in the transactions. In frequent itemset mini...